multilingual machine translation
FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation
Zhu, Shaolin, Dong, Tianyu, Li, Bo, Xiong, Deyi
In this paper, we present FuxiMT, a novel Chinese-centric multilingual machine translation model powered by a sparsified large language model (LLM). We adopt a two-stage strategy to train FuxiMT. We first pre-train the model on a massive Chinese corpus and then conduct multilingual fine-tuning on a large parallel dataset encompassing 65 languages. FuxiMT incorporates Mixture-of-Experts (MoEs) and employs a curriculum learning strategy for robust performance across various resource levels. Experimental results demonstrate that FuxiMT significantly outperforms strong baselines, including state-of-the-art LLMs and machine translation models, particularly under low-resource scenarios. Furthermore, FuxiMT exhibits remarkable zero-shot translation capabilities for unseen language pairs, indicating its potential to bridge communication gaps where parallel data are scarce or unavailable.
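As an illustration of the Mixture-of-Experts idea mentioned in the abstract above, the following is a minimal sketch of sparse top-k expert routing. The abstract does not specify FuxiMT's router, so the linear gate, the `top_k=2` default, and the names `moe_layer`, `gate_weights`, and `experts` are all assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, gate_weights, experts, top_k=2):
    """Route vector x to its top_k experts and mix their outputs.

    gate_weights: one weight vector per expert (hypothetical linear router).
    experts: list of callables mapping a vector to a same-sized vector.
    Returns the mixed output and the indices of the selected experts.
    """
    # Router score for each expert is a dot product with the input.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    probs = softmax(logits)
    # Keep only the top_k experts; the others are never evaluated (sparsity).
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)
        for d in range(len(out)):
            out[d] += (probs[i] / norm) * y[d]
    return out, chosen

# Toy usage: three hand-written "experts" over 2-dimensional inputs.
toy_experts = [
    lambda v: [2.0 * a for a in v],
    lambda v: [a + 1.0 for a in v],
    lambda v: [-a for a in v],
]
toy_gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
mixed, selected = moe_layer([3.0, 1.0], toy_gate, toy_experts)
```

Only the selected experts run for a given input, which is what lets such a layer add parameters without a proportional increase in per-token compute.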
Multilingual Machine Translation with Quantum Encoder Decoder Attention-based Convolutional Variational Circuits
Dikshit, Subrit, Tiwari, Ritu, Jain, Priyank
In the 2000s, artificial intelligence and deep learning-based systems became prevalent and took the world by storm. Many modern multilingual state-of-the-art [1] networks and cloud-based translation services such as Google Translate, Microsoft Translator, ChatGPT [2], and DeepSeek [3] emerged and became available during this era. These multilingual large language networks are architected around Gated Recurrent Unit (GRU) networks [4], Long Short-Term Memory (LSTM) [5], Bidirectional Encoder Representations from Transformers (BERT) [6], Generative Pre-trained Transformer (GPT) [7], Text-to-Text Transfer Transformer (T5) [8], and similar attention-based transformer [9] networks with finer and improved amendments to their architectures. Most academicians, researchers, and organisations focused on these classical computing aspects, and less emphasis was put on multilingual machine translation in the quantum computing realm. Some practitioners and scholars did emphasise quantum computing for machine translation; they and their associated works are discussed in the Related Works section later. However, this research under-utilized simulation and execution on quantum computing hardware and under-exploited the novel concepts of quantum convolution [10], quantum pooling [11], quantum variational circuits [12], and quantum attention [13] as quantum-based software amendments. These shortcomings are studied, demonstrated, and addressed in the QEDACVC system.
PMMT: Preference Alignment in Multilingual Machine Translation via LLM Distillation
Sun, Shuqiao, Yao, Yutong, Wu, Peiwen, Jiang, Feijun, Zhang, Kaifu
Translation is important for cross-language communication, and many efforts have been made to improve its accuracy. However, less effort has been invested in aligning translations with human preferences, such as translation tone or style. In this paper, a new method is proposed to effectively generate large-scale multilingual parallel corpora with specific translation preferences using Large Language Models (LLMs). Meanwhile, an automatic pipeline is designed to distill human preferences into smaller Machine Translation (MT) models that can support large-scale calls in online services efficiently and economically. Experiments indicate that the proposed method leads by a large margin in translation tasks with aligned human preferences. On popular public benchmarks such as WMT and Flores, on which our models were not trained, the proposed method also performs competitively with SOTA works.
Massively Multilingual Text Translation For Low-Resource Languages
Translation into severely low-resource languages has both the cultural goal of saving and reviving those languages and the humanitarian goal of assisting the everyday needs of local communities that are accelerated by the recent COVID-19 pandemic. In many humanitarian efforts, translation into severely low-resource languages often does not require a universal translation engine, but a dedicated text-specific translation engine. For example, healthcare records, hygienic procedures, government communication, emergency procedures and religious texts are all limited texts. While generic translation engines for all languages do not exist, translation of multilingually known limited texts into new, low-resource languages may be possible and may reduce human translation effort. We attempt to leverage translation resources from rich-resource languages to efficiently produce the best possible translation quality for well-known texts, which are available in multiple languages, in a new, low-resource language. To reach this goal, we argue that in translating a closed text into low-resource languages, generalization to out-of-domain texts is not necessary, but generalization to new languages is. Performance gain comes from massive source parallelism by careful choice of close-by language families, style-consistent corpus-level paraphrases within the same language and strategic adaptation of existing large pretrained multilingual models to the domain first and then to the language. Such performance gain makes it possible for machine translation systems to collaborate with human translators to expedite the translation process into new, low-resource languages.
Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens
Stap, David, Niculae, Vlad, Monz, Christof
We argue that translation quality alone is not a sufficient metric for measuring knowledge transfer in multilingual neural machine translation. To support this claim, we introduce Representational Transfer Potential (RTP), which measures representational similarities between languages. We show that RTP can measure both positive and negative transfer (interference), and find that RTP is strongly correlated with changes in translation quality, indicating that transfer does occur. Furthermore, we investigate data and language characteristics that are relevant for transfer, and find that multi-parallel overlap is an important yet under-explored feature. Based on this, we develop a novel training scheme, which uses an auxiliary similarity loss that encourages representations to be more invariant across languages by taking advantage of multi-parallel data. We show that our method yields increased translation quality for low- and mid-resource languages across multiple data and model setups.
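To make the auxiliary similarity loss described in the abstract above more concrete, here is a minimal sketch under stated assumptions. The abstract does not give the loss formula, so mean pooling, cosine distance, the `weight` value, and the function names are hypothetical choices for illustration only. The inputs stand for encoder outputs of the same sentence written in two different source languages, as available in multi-parallel data.

```python
import math

def mean_pool(token_vectors):
    """Average a list of token vectors into one sentence vector."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(tok[d] for tok in token_vectors) / n for d in range(dim)]

def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_loss(enc_a, enc_b):
    """1 - cosine similarity between pooled representations.

    enc_a, enc_b: token-level encoder outputs for the same sentence
    in two source languages (an assumed setup for this sketch).
    """
    return 1.0 - cosine(mean_pool(enc_a), mean_pool(enc_b))

def total_loss(translation_loss, enc_a, enc_b, weight=0.1):
    """Translation loss plus the weighted auxiliary term (weight is assumed)."""
    return translation_loss + weight * similarity_loss(enc_a, enc_b)
```

The auxiliary term is zero when the two pooled representations point in the same direction and grows as they diverge, so minimizing it pushes the encoder toward language-invariant representations.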
Mitigating Data Imbalance and Representation Degeneration in Multilingual Machine Translation
Lai, Wen, Chronopoulou, Alexandra, Fraser, Alexander
Despite advances in multilingual neural machine translation (MNMT), we argue that there are still two major challenges in this area: data imbalance and representation degeneration. The data imbalance problem refers to the imbalance in the amount of parallel corpora for all language pairs, especially for long-tail languages (i.e., very low-resource languages). The representation degeneration problem refers to the problem of encoded tokens tending to appear only in a small subspace of the full space available to the MNMT model. To solve these two issues, we propose Bi-ACL, a framework that uses only target-side monolingual data and a bilingual dictionary to improve the performance of the MNMT model. We define two modules, named bidirectional autoencoder and bidirectional contrastive learning, which we combine with an online constrained beam search and a curriculum learning sampling strategy. Extensive experiments show that our proposed method is more effective both in long-tail languages and in high-resource languages. We also demonstrate that our approach is capable of transferring knowledge between domains and languages in zero-shot scenarios.
UvA-MT's Participation in the WMT23 General Translation Shared Task
Wu, Di, Tan, Shaomu, Stap, David, Araabi, Ali, Monz, Christof
This paper describes UvA-MT's submission to the WMT 2023 shared task on general machine translation. We participate in the constrained track in two directions: English <-> Hebrew. In this competition, we show that by using one model to handle bidirectional tasks, as a minimal setting of Multilingual Machine Translation (MMT), it is possible to achieve results comparable to those of traditional bilingual translation for both directions. By including effective strategies such as back-translation, a re-parameterized embedding table, and task-oriented fine-tuning, we obtained competitive final results in the automatic evaluation for both the English -> Hebrew and Hebrew -> English directions.
Serial or Parallel? Plug-able Adapter for multilingual machine translation
Zhu, Yaoming, Feng, Jiangtao, Zhao, Chengqi, Wang, Mingxuan, Li, Lei
Developing a unified multilingual translation model is a key topic in machine translation research. However, existing approaches suffer from performance degradation: multilingual models yield inferior performance compared to the ones trained separately on rich bilingual data. We attribute the performance degradation to two issues: multilingual embedding conflation and multilingual fusion effects. To address the two issues, we propose PAM, a Transformer model augmented with defusion adaptation for multilingual machine translation. Specifically, PAM consists of embedding and layer adapters to shift the word and intermediate representations towards language-specific ones. Extensive experimental results on IWSLT, OPUS-100, and WMT benchmarks show that PAM outperforms several strong competitors, including series adapter and multilingual knowledge distillation.
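The serial versus parallel distinction raised in the title above can be sketched generically as follows. This is not PAM's actual architecture, which the abstract does not detail; the bottleneck adapter form, the ReLU nonlinearity, and all function names are assumptions chosen to show only where the adapter reads its input in each placement.

```python
def linear(matrix, x):
    """Multiply a row-major weight matrix by vector x."""
    return [sum(w * a for w, a in zip(row, x)) for row in matrix]

def relu(x):
    return [max(0.0, a) for a in x]

def bottleneck_adapter(x, down, up):
    """Down-project, apply ReLU, up-project (a common adapter shape)."""
    return linear(up, relu(linear(down, x)))

def serial_adapter(layer, x, down, up):
    """Serial placement: the adapter reads the layer's output."""
    h = layer(x)
    return [a + b for a, b in zip(h, bottleneck_adapter(h, down, up))]

def parallel_adapter(layer, x, down, up):
    """Parallel placement: the adapter reads the layer's input."""
    h = layer(x)
    return [a + b for a, b in zip(h, bottleneck_adapter(x, down, up))]

# Toy usage with a stand-in "layer" that doubles its input.
toy_layer = lambda v: [2.0 * a for a in v]
toy_down = [[1.0, 1.0]]      # projects 2 dimensions down to 1
toy_up = [[1.0], [0.0]]      # projects 1 dimension back up to 2
serial_out = serial_adapter(toy_layer, [1.0, 2.0], toy_down, toy_up)
parallel_out = parallel_adapter(toy_layer, [1.0, 2.0], toy_down, toy_up)
```

In a language-specific setup, one would keep a separate `(down, up)` pair per language and select it at run time, leaving the shared layer untouched.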